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Abstract — This paper examines the problem of estimating the 
parameters of a bandlimited signal from samples corrupted by 
random jitter (timing noise) and additive lid Gaussian noise, 
where the signal lies in the span of a finite basis. For the 
presented classical estimation problem, the Cramer-Rao lower 
bound (CRB) is computed, and an Expectation-Maximization 
(EM) algorithm approximating the maximum likelihood (ML) 
estimator is developed. Simulations are performed to study 
the convergence properties of the EM algorithm and compare 
the performance both against the CRB and a basic linear 
estimator. These simulations demonstrate that by post-processing 
the jittered samples with the proposed EM algorithm, greater 
jitter can be tolerated, potentially reducing on-chip ADC power 
consumption substantially. 

Index Terms — analog-to-digital converters, Cramer-Rao 
bound, EM algorithm, jitter, maximum likelihood estimator, 
sampling, timing noise. 



I. Introduction 

An analog-to-digital converter (ADC) processes a real signal 
x{t) to generate a sequence of observations (samples) {yn] at 
times {tn}'. 



ADC 



Vn = [s{t)*x{t\ 



W„ 



(1) 



where s{t) is the sampling prefilter and u)„ is an additive 
noise term that lumps together quantization, thermal noise, and 
other effects. For standard sampling applications, it is assumed 
that the sample times are uniformly spaced by some period 
Ts (tn = nTs, n G Z), where the period is small enough 
that the total bandwidth of x{t) is less than the sampling rate 
Fs = l/Ts- Jitter {z„}, also known as timing noise, perturbs 
the sample times: 

tn = nTs + Zn- (2) 

This paper focuses on the mitigation of random jitter in a non- 
Bayesian estimation framework. A simplified block diagram 
for the overall system is illustrated in Figure [1] 

The generally-accepted practice is to design clocks with 
low enough phase noise that the effect of jitter is negligible. 
The maximum allowable jitter is set such that the effect of 
jitter on a sinusoid of maximum frequency and maximum 
amplitude is at most one-half the least significant bit level IT]. 
While making jitter negligible obviates the mitigation of 
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Fig. 1. Abstract block diagram of an ADC with off-chip post-processing. 
The signal x{t) is filtered by the samphng prefilter s{t) and sampled at time 
tn- These samples are corrupted by additive noise w„ to yield tjn. The post- 
processor estimates the parameters x of x{t) using the vector of N samples 
y from the ADC. 



jitter, this may not be possible or desirable from an overall 
system design perspective, because requiring jitter to have 
a negligible effect may mandate high power consumption 
in the clock circuitry. (A model relating jitter and power is 
given below.) Technological trends suggest that eventually it 
will be worthwhile to allow nontrivial jitter and compensate 
through digital post-processing: the digital portions of mixed- 
signal systems like sensors and wireless transceivers continue 
to shrink, so the analog portions of such systems, including 
the ADC and its clock generator, dominate the size and 
power consumption of these chips. The abiUty to use more 
power-efficient analog circuitry would enable substantial new 
capabilities in diverse applications like implantable medical 
devices and remote sensors. One motivation of our study is 
to contribute to understanding the trade-off between accuracy 
in analog circuitry vs. complexity of off-chip digital post- 
processing of samples. 

The power consumed by a typical ADC design is approx- 
imately proportional to the desired accuracy and sampling 
rate ||2l, so lower-power circuitry would produce clock signals 
with more jitter. Specifically, it is shown in Q that 



Power oc Speed x (Accuracy (rms))^ 



(3) 



Furthermore, the analyses in H and Q suggest that in the 
large-jitter domain, every doubling of the standard deviation 
of the jitter reduces the effective number of bits (ENOB = 
log2 accuracy (rms)) by one. Thus, pre-compensating for the 
expected jitter in the design requires increasing the power 
consumption of the ADC by a factor of four for every doubling 
of the jitter standard deviation (e.g. by adding an additional 
level of comparators). In this paper, we instead propose block 
post-processing the jittered samples using classical estimation 
techniques off-chip. In addition to mitigating random jitter, 
this work may also be adapted to compensate for frequency 
modulated and spread-spectrum clocks, which produce lower 
EMI and radiation ||6j|. Note that this block post-processing 



method is intended to be performed off-chip, where power 
consumption of an implementation of the algorithm is not 
important. 

A. Problem Formulation 

While more sophisticated signal models may be more 
appropriate for some applications, we concern ourselves with 
a signal that lies in the span of a finite basis {hk{t)}, with 
K basis functions. We further restrict the basis to be uniform 
shifts of a single smooth bandlimited function h{t): 



hkit) = h{t-kT). 



(4) 



Denote the unknown weighting parameters x^; there are K of 
them. The signal x{t) then equals 



x{t) 



K-l 

E 

fc=0 



Xkh{t — fcT). 



(5) 



Without loss of generality, assume that T equals the critical 
sampling period of x{t), and assume this period is unity. 
In this work, the signal parameters {xk}^.SQ are unknown 
deterministic quantities. 

While there are many possible choices of h{t), when in 
need of a specific example in this work, we choose the sine 



sin(7rt) 



This basis satisfies 



interpolating function sinc(i) = 
x{k) — Xk, for all k = 0, . . . , K — 1. In general, we choose 
h{t) to be appropriate for the class of input signals we wish to 
sample; we choose the sine function because bandlimitedness 
is a common assumption in the context of signal processing. 
It is sufficient, but not necessary, in this work that h{t) be 
analytic and bounded. 

When sampling the signal x{t), we will assume that the 
sampling prefilter is an ideal anti-aliasing filter with bandwidth 
Fs, so [s{t) * x{t)]t=tri = x{tn) for appropriately bandlimited 
inputs. The signal's critical sampling period is assumed to 
be one, but to accommodate oversampling by a factor of M 
into our model, the ideal sample times are spaced 1/AI time 
units apart. We acquire N jittered samples with additive noise, 
yo,...,yN^i, at this rate: 



Hn = x{n/M + z„) + Wn- 



(6) 



In this paper, we will assume that the jitter and additive noise 
are iid zero-mean Gaussian, with known variances equal to cr^ 
and (7^, respectively. We assume that these variances can be 
measured reasonably accurately through in-factory calibration, 
although we expect the variances to vary naturally over time 
due to environmental effects. 

Combining the signal and observation models yields 



Vn 



K-1 

E 

fc=0 



h{n/M + Zn — k)xk + w„ 



(7) 



This relationship can be expressed as a semilinear system of 
equations: 

y = H(z)x + w, (8) 



where y = [yo,...,yN- 
[zo,...,zn~iY, and w = 



[wo,...,wn-i_ 



^. For notational 



convenience, denote the nth (zero-indexed) row of H(z) by 

KiZn). 

To keep notation compact, denote the probability density 
function (pdf) of a by p{a), the pdf of a parameterized 
by the nonrandom vector b by p(a; b), and the pdf of a 
conditioned on the random variable c by p(a | c). The pdf is 
made explicit using subscripts only when necessary to avoid 
ambiguity. Expectations are written similarly. Also, denote the 
d-dimensional multivariate normal distribution by 

7V(a;/x,A) = \2ttA\-'^^^ exp {-^{a- fif A-^a- ^i)} . 

(9) 
The primary objective of classical (non-Bayesian) estima- 
tion is to derive an estimator x that minimizes a desired cost 
function C(x;x) of unknown nonrandom parameters x. One 
such cost function is the mean squared error (MSE): 



C(x;x)-Ey[||x(y)-x||2], 



(10) 



where Ey[-] is the expectation with respect to y, x(y) is the 
estimate of the unknown parameters x based on the samples y, 
and the observations y are implicitly a function of x. However, 
except in certain cases, computing the estimate that minimizes 
this cost function, which is called the minimum MSE (MMSE) 
estimator, is impossible without prior knowledge about x. 
Instead, it is essential to derive an estimator that relies only 
on the observation model and the actual observations y. One 
such estimator is the maximum likelihood (ML) estimator, 
which maximizes the likelihood function ^(x;y) = p(y;x). 
The likelihood function corresponding to the signal parameter 
observation model in (O is 

£(x; y) = y AA(y; H(z)x, a^ I)A/-(z; 0, a^I) dz. (11) 

Using the assumptions that the jitter and additive noise are 
iid, and the fact that the nth row of H(z) only depends 
on one z„, the multivariate normal distributions in ( fTTI ) 
are separable over z. Thus, p{y; x) is also separable, with 
p(y;x) = YinPiyn'i^}' and the likelihood function is the 
product of N univariate integrals 

N-l „ 

^(x;y)= Yl JV{yn;hl{zn):>i,al)Af{zn;0,al)dzn. 

(12) 
Given the likelihood function, parameters M, a1, and crj, 
MSE cost function, and observations y, the goal of this work 
is use ML estimation to tolerate more jitter when estimating x. 
Thus, the bulk of this paper is concerned with the evaluation 
of this likelihood function and the problem of maximizing it. 

B. Related Work 

The problem of mitigating jitter has been investigated since 
the early days of signal processing. The effects of jitter 
on the statistics of samples of a deterministic (nonrandom) 
bandlimited signal are briefly discussed in |7|; this work also 
is concerned with stochastic signals and proposes an optimal 
linear reconstruction filter for the stochastic case. Much more 
work on analyzing the error and reconstructing stochastic 
signals from jittered samples can be found in fS) and (J9]- 
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However, the analysis of jittered samples of deterministic 
signals appears to be much more limited in the early literature. 

When the sample times are irregularly spaced, but known, 
the problem greatly simplifies. Efficient techniques, as well 
as a mention of prior work, can be found in fTOl . When the 
sample times are unknown, but belong to a known finite set, 
the jitter mitigation problem becomes a combinatorial one; 
|[TT1 describes geometric and algebraic solutions to this prob- 
lem of reconstructing discrete-time signals. Two block-based 
reconstruction methods for this finite location-set problem are 
described in lfT2l . 

However, when the set is infinitely large, or when the jitter 
is described by a continuous random distribution as seen 
here, a different approach is necessary. One contribution of 
this work is an Expectation-Maximization (EM) algorithm; 
in a similar context, 1731 develops a similar EM algorithm 
for the related problem of mitigating unknown phase offsets 
between component ADCs in a time-interleaved ADC system. 
Some of the results summarized in this paper are described in 
greater detail in (T4\, which also provides further background 
material. 

C. Outline 

In Section [III numerical integration using Gauss quadrature 
and iteration using the EM algorithm are discussed. Sectionlllll 
presents and derives the Cramer-Rao lower bound (CRB) on 
the MSE for this estimation problem. Sections |IV] and [V] 
derive linear and ML estimators for the jitter mitigation 
problem; simulations comparing these estimators are discussed 
in Section |VT] In conclusion, the results and contributions are 
summarized, and future research directions are introduced. 

II. Background 

Except for certain limited choices for h{t), the expression 
for the likelihood function in ( fT2] ) has no closed form; how- 
ever, various techniques exist to approximate it. One such 
powerful and general technique is that of quadrature, which 
refers to the method of approximating an integral with a finite 
weighted summation. The trapezoidal and Simpson's rules 
are elementary examples of quadrature. In particular, due to 
the normal distribution assumption on the jitter 2„, Gauss- 
Hermite quadrature is a natural choice of quadrature rule. 
Gauss-Legendre quadrature and Romberg's method are also 
discussed below. 

Computational problems also occur when deriving the ML 
estimator, due to the nonconcave and high-dimensional nature 
of the likelihood function. One local approximation technique 
called the EM algorithm can be used to locate local maxima in 
a computationally-feasible manner The EM algorithm is also 
introduced in this section. 

A. Numerical Integration 

Consider approximating the integral J f{x)w{x) dx using 
the summation X]i=i ^j/(^j)' where Xj and Wj are fixed 
abscissas (sampling locations) and weights. This type of 
approximation is known generally as quadrature. When the 



abscissas are uniformly spaced, the summation is known as 
interpolatory quadrature; the trapezoidal and Simpson's rules, 
as well as Romberg's method 1:151 , are of this type. Gauss 
quadrature seeks greater accuracy for a given number of 
function evaluations by allowing the abscissas to be spaced 
nonuniformly. An appropriate choice of abscissas and weights 
(called a rule) can be precomputed for a choice of w{x) 
and J using a variety of methods, including a very efficient 
eigenvalue-based method derived in lfT6l . Orthogonal poly- 
nomials satisfy a three-term recursive relationship, which is 
used to form a tri-diagonal matrix, whose eigenvalues are 
the abscissas, and whose eigenvectors yield the weights. The 
eigendecomposition of a tri-diagonal matrix is very efficient, 
so quadrature rules are very inexpensive to compute, even for 
very large J. Quadrature is particularly attractive when f{x) 
is smooth and has bounded derivatives. This method can be 
applied to multivariate integration as well, although in the 
absence of separability, the complexity scales exponentially 
with the number of variables. 

One weighting function of particular interest in this work 
is the pdf of the normal distribution. For a standard normal 
distribution, the associated form of quadrature is known as 
Gauss-Hermite quadrature, since the abscissas and weights 
derive from Hermite polynomials. Using elementary changes 
of variables, this method can be generalized to normal distri- 
butions with arbitrary mean /i and variance a^: 



f{x)J\f{x; n, CT^) dx 



^Wjfiaxj 



m), 



(13) 



where Wj and Xj are the weights and abscissas for Gauss- 
Hermite quadrature with a standard normal weighting function. 
As mentioned in ifTTI . the approximation error for Gauss- 
Hermite quadrature is bounded by the function's derivatives: 



f{x)J\f{x; ^, o-^) dx ~y^ Wif{axj + fi) 



J 



< - — — - max 
- (2J)! X 



(14) 

As long as f{x) is sufficiently smooth, the (2J)! term in the 
denominator dominates the above expression for large J, and 
the approximation error goes to zero superexponentially fast. 
While general conditions for convergence are difficult to iso- 
late for arbitrary f{x), a sufficient condition for convergence 
mentioned in |17| is that 

lim in.MJ < 1^ for some p > 0. (15) 

a:— ^cxj e^ 

Many other Gauss quadrature rules exist; one simple rule 
also considered is called Gauss-Legendre quadrature and is 
defined for integrating over a finite interval [a,b], with the 
weighting function w{x) = 1: 



fix) dx 



J 



WjfiXj 



(16) 



The abscissas and weights for Gauss-Legendre quadrature can 
be computed using the eigenvalue-based method mentioned 
above. Gauss-Legendre quadrature and other rules defined 



f(2J)c 



over a finite interval, including interpolatory quadrature meth- 
ods like Simpson's rule and Romberg's method, can be ex- 
tended to the infinite support case by re-mapping the variable 
of integration: 



f{x)w{x) dx 



7r/2 



7r/2 



/(tan(2/))w(tan(7;)) sec^(y) dy. 

(17) 



When applied to the Gauss-Legendre quadrature rule, the new 
rule becomes 



f{x)w{x) dx 



J 



</(<■), 



(18) 



where x'j — tan(a;j), and w'^ — w{x'j){l + {xj)^)wj. 

To compare the effectiveness of these different quadrature- 
based methods for numerical integration, Gauss-Hermite 
quadrature and Gauss-Legendre quadrature are contrasted 
against two more general methods, Simpson's rule and 
Romberg's method, by comparing each method against the 
marginal likelihood function p(y„;x), for a fixed, but ran- 
domly chosen, value of x. The marginal likelihood function is 
calculated from the empirical distribution of samples generated 
by the observation model in Q. As shown in Figure|2l Gauss- 
Legendre quadrature approximates the likelihood function well 
when (Jz is relatively large, but when ct^ and a^^ are both small, 
Gauss-Hermite quadrature is much more effective. However, 
other quadrature rules may be more accurate for different 
choices of signal basis functions {hk{t)}. 



B. EM Algorithm 

The EM algorithm was introduced in ifTSll : a classic ap- 
plication of this algorithm is ML estimation in the presence 
of incomplete data. Consider the problem of maximizing the 
likelihood function i{x;y), where y depends on some latent 
random variables z. The observations y are described as 
the incomplete data. We augment this incomplete data with 
some subset of latent (hidden) variables to form the complete 
data. The underlying assumption of the EM algorithm is that 
knowledge of the complete data makes the ML estimation 
problem easier to solve. 

The EM algorithm consists of repeatedly maximizing the 
function 



(x;x(-i)) 



= E 



logp(y,z;x) I y;x 



(i-i) 



(19) 



with respect to the desired parameters x; the maximizing 
value becomes x^*^ which is used in the next iteration. As 
long as the original likelihood function is bounded above, 
and some other mild conditions are satisfied, this algorithm is 
guaranteed to converge to a local maximum of the likelihood 
function lITSl . 

Much has been written about the convergence rate of EM al- 
gorithms. In 1 19 1, the rate of convergence of the EM algorithm 
is related to the difference in the CRB using the incomplete 
data and the CRB using the complete data (incomplete data 
+ latent variables). The supplemented EM algorithm in 1201 
also obtains Fisher information estimates, conditioned on the 
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(a) K = 10, M = 4, o-z = 0.75, o-» = 0.1, J = 129, n = 32. 



- Gauss-Hermite quad. 
Romberg's mettnod 

- Simpson's ruie 

- Gauss-Legendre quad. 




V 



(b) K = 10, M = 4, cr^ = 0.01, o-„ = 0.01, J = 129, n = 34. 

Fig. 2. The quadrature approximation to p{yn\'x) is computed for a fixed, 
but randomly chosen, x on a dense grid of y„ and is compared against a 
histogram of 100 000 samples j/,i generated using (TJ. Two cases are shown to 
illustrate the approximation quality of [(a)] Gauss-Legendre quadrature and |(b)| 
Gauss-Hermite quadrature; the worst-case n is chosen for each of these 
approximations. Note: the choice of J = 129 is used instead of J = 100 
because Romberg's method evaluates the function J = 2^ -|- 1 times for j 
iterations. 



observations y, which can be used to evaluate the quality of 
the resulting approximation to the ML estimate. 

Since the likelihood function in ( fTTI ) is not in general strictly 
concave, the presence of many critical points is a potential 
problem for any local algorithm. Simulated annealing II2T1I and 
other methods can be combined with the EM algorithm to 
improve robustness to getting trapped in local extrema. 

III. Approximating the CRB 

Minimizing the MSE without access to prior information 
about the unknown parameters x may be impossible in the 
general case. However, even in situations when the MMSE 
estimator cannot be computed, the Cramer-Rao lower bound 
on the minimum achievable MSE by an unbiased estimator 
may be straightforward to compute. The CRBy(x), where x 
is the variable to estimate from observations y, is defined to 
be the trace of the inverse of the Fisher information matrix 
Iy(x), which is defined as 



Iy(x)=E 



aiog£(x;y)\ /(91og£(x;y) 



9x 



dy 



(20) 



Assuming the likelihood function satisfies the regularity con- 
dition 

a2^(x;y)l _ p dH{x-y) 



E 



1 



€(x; y) 9x9x^ 



9x9x^ 



■dy-0, (21) 
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the Fisher information matrix can be expressed in terms of the Since ^[logp(2/„;x)] is equal to , ^ ,, a^ ^^ combin- 
ing dSOl ) and dSTl ) yields a complicated approximation to the 



Hessian of the log-Ukelihood: 

"aiog^(x;y)aiog^(x;y) 



[Iy(x)],. 



= E 
= E 

= E 



dxi 



dxk 



1 9£(x;y)9^(x;y) 



£(x; y)2 dxj dxk 

1 a£(x;y)9£(x;y) 
£(x; y)2 axj dxk 



(22) 



(23) 



expression inside the expectation in 



91ogp(y„;x)\ /91ogp(y„;x) 



9x 



9x 



1 92^(x;y) 



_9_ 

dxk 



-1 g^(x;y) 
^(x; y) dxj 



^(x;y) dxjdxk 
(24) 

(25) 



' EjCi ^'jlj/n - h^(^j)x)h„(zj)A/"(yn; h^(zj-)x, cr.g,) \ 



(32) 



Iy(x) = -E 



a2log^(x;y) 



9x9x^ 



(26) '^^'^ convenience, denote the above approximation F„(j/„;x). 
Now, consider computing the Fisher information matrix 

A sufficient condition for the regularity condition in (l2Tl i is PP 

that differentiation and integration interchange, since 'Iz} 

Iy(x)« ^E[F„(2;„;x)]. (33) 



9x9x^ 



dy = 



d' 



9x9x^ 



^(x;y)dy 



d' 



9x9x 



t(1) = 0- 

To compute this expectation, a numerical method is needed 



(27) again. The expectation is with respect to the distribution 

By basic analysis, uniform convergence of the integral in p(y„;x), which is approximated in dH with a Gaussian 

likelihood function in ^ impUes that the above regularity mixture, so Monte Carlo sampling is a convenient method 

condition holds. jq approximate this expectation. Generating S samples y„ ^ 

Since the Ukelihood function is separable, the log-likelihood f^^^ ^he Gaussian mixture Y^Ui '^J^iv^ ; K i^] )^.<^l) and 

function can be expressed as the summation of marginal log- averaging the corresponding function values F„(2/„,,; x), the 
Ukehhood functions; i.e. 

log£(x;y) = ^ logp(2/„;x) 



N-l 



(28) 



n=0 



which means that ( |26l ) can be rewritten as (again, assuming 
regularity conditions are satisfied) 



Fisher information matrix can be computed as 

N-l S 

lyW^^EE^" (?/«.« ;x). (34) 

n=0 s=l 

Once this matrix is computed and inverted, the trace gives the 

Cramer-Rao lower bound for that choice of parameter x. Due 

to matrix inversion, to ensure an accurate CRB estimate, we 

use J — 1000 for the quadratures in (|32] |. 

How much does the CRB decrease when z is assumed 

given? Comparing the CRBy(x) against the CRBy z(x) of the 

_, •,,,-/ X , , • ,, jitter-augmented data will be important later when analyzing 

The marginal pdt b(w„:x) can be computed numerically .. t-a/t i -.u a ■ t-u t- u ■ c ^- . ■ ■ 

. ^ ^^^ ' ' ^ ■' the EM algorithm design. The Fisher information matrix in 

this case is equal to 

AT-l 



N-l 



Iy(x) = ^ E 



aiogp(y„;x)\ /(91ogp(2/„;x 



dy 



dy 



(29) 



using quadrature: 

J 

p(y„;x) « ^u;j-7V(2/„;h^(zj)x,cr^), 



(30) 



Iy,z(x) 



J=i 



E: 

n=0 



9Mogp(2/„,z„;x) 



dxdx^ 



(35) 



where Zj and Wj are the abscissas and weights for the chosen course, since 

quadrature rule. As depicted in Figure |2] Gauss-Hermite 

quadrature is a good choice for small a^, and Gauss-Legendre 

quadrature is more accurate for larger ct^ (> 0.1). For all 

simulations in this paper, we use J = 100 unless otherwise 

specified. 

The derivative of the marginal pdf can be approximated 
similarly: 



logp(yn,2«;x) = — ^(y„-h^(z„)x)2 + — ^z^+constant, 
''- '^^ (36) 

(37) 



and the Hessian matrix with respect to x is 

log p{yn,z„;x) = -^-h„(z„)h^(z„), 



^p(y«;x) 
9x 



(2/„ -h^(z„)x)h„(z„) 



the jitter-augmented Fisher information matrix is 

N-l 



(j:. 



^fiyn;K{zn)^,al)^f{zn;0,al)dJnJ^) = ^ ^ E [h„(z„)h^(z„)] , (38) 



2^ ^^ ^2 



i=i 



a. 



w 



A/'(y„' h^(z )x cr^ ) where the expectation can be approximated numerically us- 
ing quadrature or Monte Carlo approximation. The jitter- 
(31) augmented CRByz(x) is the trace of the inverse of this matrix. 



We will return to the question of the difference of the two 
CRBs later in Section |Vll after we discuss ML estimation 
using an EM algorithm. 

IV. Linear Estimation 

In this paper, an estimator is said to be linear if it is a linear 
function of the observations; such an estimator has the form 

XL(y) = Ay, (39) 

where the matrix A is fixed^ 

For the semilinear observation model in ^, a linear estima- 
tor is unbiased if and only if AE[H(z)] = I. Since ]E[H(z)] 
is a tall matrix, assuming it has full column rank, one possible 
linear unbiased estimator is 



XL(y)=E[H(z)]ty, 



(40) 



where E[H(z)]t ^ (E[H(z)]'^E[H(z)]) E[H(z)]^ is the 
left pseudoinverse. This estimator is only one such linear 
unbiased estimator; more generally, any matrix that lies in the 
nullspace of E[H(z)] can be added to the pseudoinverse and 
yield an unbiased estimator 

The question then remains of how to obtain the best linear 
unbiased estimator (BLUE), in the MMSE sense. In the 
context of a simple linear observation model y = Hx + w, 
with Gaussian noise w, the BLUE is elementary to find 
(see II22I '). and it is also the ML and efficient minimum 
variance unbiased estimator (MVUE). If we choose z = 
to be deterministic (no jitter) in the observation model, the 
corresponding BLUE/efficient estimator would be 



eff|z 



.o(y) = (h(o)^H(o; 



Hiofy- 



(41) 



The performance of this estimator when applied to the proper 
(jittered) observation model will be used as one baseline for 
MSE improvement for the proposed estimators. 

As derived previously in fT4|, the BLUE for the semilinear 
model (O is 

XBLUE(y) = (E[H(z)]^AyiE[H(z)])"'E[H(z)]^Ay V, 

(42) 
where the covariance matrix of the data Ay depends on the 
value of the parameters: 



all. 



Ay = E[H(z)xx^H(z)^] - E[H(z)]xx'^E[H(z) 

(43) 
The BLUE estimator, in general, is not a valid estimator, since 
it depends on the true value of the unknown x. Two sufficient 
conditions for the estimator to be valid are: Ay is a scalar 
matrix, in which case, the covariance matrix commutes across 
multiplication, or Ay does not depend on x. Since Zm and 
Zn are independent for m ^ n, off-diagonal elements of Ay 
are zero. For the covariance matrix to be a scalar matrix that 
commutes over matrix multiplication for all x, cov(h„(z„)) 
must be equal for all n. However, this equality generally does 
not hold due to oversampling. Also, the covariance matrix 

'Sometimes, affine estimators x(y) = Ay + b are considered to be lineal' 
as well. However, as we will concern ourselves with unbiased estimators, b 
would turn out to be necessarily zero. 



• 



Fig. 3. The best linear unbiased esti mato r (BLUE) is computed for different 
choices of x (holding y fixed), using )42t . In this example, K = 3, M = 2, 
and 0-2=0"^= 0.25. 



clearly depends on x when cov(h„(z„)) is nonzero for some 
n. When the covariance matrix Ay is a scalar matrix, the 
BLUE estimator is equal to XL(y). 

To conclusively demonstrate that the BLUE is not a valid 
estimator, the estimator is computed for a fixed value of y 
and varying x; the results are shown in Figure [3] Clearly, 
since the estimates of Xk vary depending on the value of x 
used in (l42l i. the estimator is not valid. Thus, an MSE-optimal 
linear estimator does not exist for this problem, and we will 
utilize the estimator in ( l40l i. 

V. ML Estimation 

Given a semilinear model as in dS), we would not expect 
the optimal MMSE estimator to have a linear form. Indeed, 
as shown in the previous section, for most signal models and 
priors on the jitter, the BLUE does not even exist. To improve 
upon linear estimation, and reduce the MSE, we move to 
maximum likelihood estimation. 

Consider the problem of maximizing the likelihood function 
in (O; since the logarithm is an increasing function, we can 
perform the optimization by maximizing the log-likelihood: 



AT-l 

XML(y) = argmax ^ logp(2/„;x) 

n— 



(44) 



However, since the marginal pdf does not have a closed 
form, and neither do its derivatives, performing the necessary 
optimization is difficult. Numerical techniques may be applied 
directly to (l44l i. and various general-purpose methods have 
been studied extensively throughout the literature. An iterative 
joint maximization method proposed in |23| attempts to ap- 
proximate the ML estimate by alternating between maximizing 
p{z I y;x) with respect to z and p{y \ z;x) with respect to 
X. One method that explicitly takes advantage of the special 
structure in ( l44l i is the EM algorithm. 

A. ML Estimation using the EM Algorithm 

Consider the function Q (x; xf^'^^^) in (fT9] l. The expression 
for logp(2/„, 2;„; x) is in (|36] |. and summing these together 
(without the minus sign) gives 



1 



1 



logp(y,z;x) = --^||y-H(z)x||^-— ^||z||^+constants 



2al 



2a? 



(45) 
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Expanding and substituting into the expectation in ( fT9] l yields 



Q(x;x(»-i)) 



-1 

W 

1 



T 

y y 



2y^E 



H(z)|y;x 



(t-i) 



tE 



T 
Z Z 



y;x 



(»-i) 



constants. 



(46) 



We want to find the value of x that maximizes this expres- 
sion. Noticing that ( |46] ) is quadratic in x, the candidate value 
X satisfies the linear system 

T 



E 



H(z)^H(z) 



y;^^'- 



(r-l) 



E H(z) |y;x^' ^' y. 
(47) 

Also, the Hessian matrix is negative-definite, so (|46] | is strictly 
concave, and the candidate point x is the unique maximum 



Algorithm 1 Pseudocode for the EM algorithm for computing 
ML estimates for the unknown signal parameters x. 

'-im^'^f^^k^fy:'M I, J, s, e 

Compute J-term quadrature rule (using e.g., the eigende- 
composition method in [16J ) for use in below approxima- 
tions (use J — 100). 
repeat 

i ^ i + 1 

tor n^Oto N ~1 do 

Approximate p(j/„;x('~^^) using ( |30] |. 



Compute E h„(z„)h^(z„) | y„;x 



v(^-i) 



using ( |52] |. 
using (|53] |. 



>(«) 



All that remains to specify the EM algorithm is to 



approximate the expectations in ( |47] i. 

Using Bayes' rule and the separability of bothp(y, z; x) and 
p(y; x), the posterior distribution of the jitter is also separable: 



Approximate E [H(z) | y;x 
end for 
Solve for x*^') using ( |47] l and the above approximations. 

until i = I or ||xW - x(*-i)||2 < (5 or |log^(xW;y) - 
log£(x(*-i);y)| <e 
return x''^ 



p\z 



y;x^'- 



PiVn I 2n;X 



(«-l) 



PiVn 



x(*-i)^ 



N-1 

n 

n=0 
N-1 

r!=0 



) P{Zn) 



(48) 



(49) 



Thus, the expectations are also separable into univariate ex- 
pectations: 



N-1 



E 



H(z)^H(z)|y;x' 



{z-l) 



E E [hn(^ 



n=0 



E 



H(z) 



y;x(*- 



E 






)hl{Zn 



;x(- 



(51) 

The subscript after the left-side expectation in ( fSTT l denotes 
the nth (zero-indexed) row of the matrix. The distribution 
p(y„;x('~^)) is constant with respect to z„, and can be 
evaluated using quadrature, as in ( l30l l. Approximating each of 
the univariate expectations in (ISOl l and (ISTT i with quadrature 
yields 



VI. Simulation Results 

The objectives of the simulations presented here are to 
(a) analyze the behavior of the proposed EM algorithm for 
approximating the ML estimator, and to (b) compare the 
performance of this estimator to both the Cramer-Rao bound 
and that of linear parameter estimation. The convergence 
behavior is studied in detail in lfT4ll for periodic bandlimited 
^.5j^('-i^ gnals. In this work, experiments to determine convergence 
behavior and sensitivity to initial conditions are conducted 
(50) for the sine basis signal model described in Section U In 
all experiments, we utilize MATLAB to generate signals with 
pseudo-random parameters and noise and apply the algorithms 
described to the samples of these signals. For a factor of M 
oversampling, we generate N = K M samples. 



A. Convergence Analysis 



E 



H(z)^H(z) 



y;x(^- 



JV-1 



E 



H(z) 



y;x(-i) 



;^^p(y„;x(*-i)) 



J 

■E 



E^j 



p(?/«;x 



i^-m 



While guaranteed to converge, the EM algorithm would 

be of little use if it did not converge quickly. The rate of 

convergence of the EM algorithm is studied for several choices 

h„(2:j)l^(/5^|)ar^^flnldz^;J^anc^jreiftR;dare presented in Figure g] The 

rate of convergence is exponential, and the rate decreases with 

^-'^> increasing M, increasing az, and decreasing ct^. 

„ /As mentioned in Section HIl the rate of convergence of the 

Wih„(zj)p (^J/»|7Mjal|orithiii- is related to the difference between the CRBs 



(53) 



The complexity of each iteration of this algorithm appears 
to be linear in the number of samples, although the rate of 
convergence (and thus, the number of iterations required) may 
also vary with the number of samples, or other factors. The 
convergence rate, as well as susceptibility to initial conditions 
(since the EM algorithm only guarantees local convergence), 
are the subject of simulations in this work and in |14i. 

The EM algorithm for ML estimation is summarized in 
Algorithm [T] 



of the complete and the incomplete data. As shown later in 
Figure |6] the difference between the CRBs for the complete 
data and incomplete data increases exponentially with CTz. 
This relationship coincides with the convergence behavior 
observed in Figure |4b] Although these experiments evaluated 
500 iterations of the EM algorithm, the results suggest that 
100 iterations would suffice as long as the jitter standard 
deviation a^ is not too large. Also, 10^^ is chosen as a 
reasonable stopping criterion for change in x^'^ and change 
in log-likelihood between iterations {5 and e in Algorithm [T] 
respectively). 




200 300 

# of iterations 

(a) K = 10, o-z = 0.25, a^ = 0.1, M varies. 



500 













°o* 


oo 

qOO 





OD 


^ 












o 



(a) K = 10, (7z = 0.25, an, = 0.25, and varying M. 
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(b) K = 10, M = 4, o-» = 0.1, CTz varies. 
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(c) K = 10, M = 4, cTz = 0.25, o-,„ varies. 



Fig. 4. The convergence of the EM algorithm depends on the choice of 
parameters A/, a^, and a-w, as demonstrated in the above plots. 



B. Sensitivity to Initial Conditions 

The likelihood function described in (fTTT l is generally 
nonconcave, so maximizing the function via a hill-climbing 
method like the EM algorithm is only guaranteed to yield 
a local maximum. The ability of the algorithm to converge 
to the global maximum depends on the nonconcavity of the 
likelihood function. To demonstrate the sensitivity of the EM 
algorithm, as a function of M, az, and (Jiu, the empirical 
distribution of the log-likelihood of the optimal values reached 
from multiple initial conditions is evaluated over numerous 
trials for different choices of these parameters. In this exper- 
iment, the true value of x, the no-jitter linear estimator JAli . 
X = 0, and ten random choices, are used as initial conditions 
for each trial. As suggested by the spread of the samples shown 
in Figure |5] the variability of the EM algorithm increases 
with a 2 and decreasing a^^. Even when the EM algorithm 
appears sensitive to initial conditions, using the no-jitter linear 
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(c) K = 10, M = 8, cTz = 0.25, and varying ct™. 

Fig. 5. The effects of varying initial conditions of the EM algorithm as a 
function of |(a)| oversampling factor, [(b)] jitter variance, and |(c)| additive noise 
variance are studied by computing the log-likelihoods of the EM algorithm 
results, for multiple initial conditions, across 50 trials. The log-likelihood of 
the EM algorithm results are displayed relative to the result for zero-jitter 



initialization, so that the log-likelihood of the result for x'"^ 
zero. 



H(0)V is 



estimate (l4TT l results in a relatively small deviation from the 
best observed log-likelihood value. In situations when such 
initialization does fail to produce consistent results, methods 
such as the deterministic annealing EM algorithm described 
in ll24l may improve consistency. 



C. Performance of the EM Algorithm 

In the first performance experiment, the Cramer-Rao lower 
bound is compared to the unbiased linear estimator ( |40|) 
and the EM algorithm of the ML estimator to measure the 
efficiency of the algorithms. The Cramer-Rao lower bound 
for the complete data is also presented for reference. Although 
computational difficulties prevent a complete comparison for 
every possible value of x, carrying out a comparison for 
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(a) K = 10, M = 4, 0.01 < ctz < 0.5, ct™ = 0.05. 
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Fig. 6. The approximate performance.s of the linear unbiased estimator and 
ML estimator (EM algorithm) are plotted in |(a)| against the CRBy(x) and 
the complete data CRBy,z(x), as a function of a^ for K = 10, M = 16, 
(Tm = 0.05, and a fixed random choice of x. The hnear and ML estimator 
biases are plotted in |(b)| using the root-mean- squared (RMS) values of the 
bias vectors. The bars above and below each data point for the linear and ML 
estimators in|(a)|deHneate the 95% confidence intervals for those data points. 



a few randomly chosen values of x provide a measure of 
the quality of the algorithms. As the curves in Figure |6] 
demonstrate for one such random choice of x, both algorithms 
are approximately efficient for small CTz, but the EM algorithm 
continues to be efficient for larger values of az than the linear 
estimator. In addition, the bias shown for the linear and ML 
estimators is approximately the same for a^ < <Jw', this bias 
may be due to the small error in the numerical integration. 
Note how this error becomes larger with a^ ■ 

In Figure |2l the EM algorithm is compared against two 
linear estimators. First, to demonstrate the MSE improvement 
attainable through nonlinear estimation, the EM algorithm is 
pitted against the linear unbiased estimator. Since a major mo- 
tivating factor for developing these algorithms is to reduce the 
power consumption due to clock accuracy, the EM algorithm 
also can achieve the same MSE as the linear estimator for a 
substantially larger jitter variance, reducing the clock's power 
consumption. 

When the additive noise dominates the jitter (ctz ^ (Jw), the 
improvement can be expected to be minimal, since the system 
is nearly linear, and the jitter is statistically insignificant. As 
the amount of jitter increases, the density function p{z \ y; x) 
used in each iteration of the EM algorithm becomes more 
nonlinear in z, and the quadrature becomes less accurate 
for a given number of terms. Therefore, the EM algorithm 
generally takes longer to converge, and the result should be 
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(c) K = 10, M = 16, 0.01 < o-z < 0.5, a^u = 0.01. 

Fig. 7. The MSE performance of the ML estimator (EM algorithm) is 
compared against both the unbiased linear estimator j40) and the no-jitter 
BLUE )4H . as a function of a^. The bars above and below each data point 
for the linear and ML estimators delineate the 95% confidence intervals for 
those data points. 



a less accurate approximation to the true ML estimator. This 
behavior is observed in Figure |2l where the EM algorithm 
is compared against both the linear unbiased estimator and 
the no-jitter linear estimator The EM algorithm generally has 
lower MSE than either linear estimator, and the performance 
gap is more pronounced for higher oversampling factor M. 

To answer the question of how much more jitter can be 
tolerated for the same desired MSE using the EM algorithm, 
the maximum proportional increase is plotted as a function of 
M and cr^ in Figure [8] The maximum proportional increase 
for a choice of AI and cr^ is computed by approximating log- 
log domain MSE curves, like those in Figure |7] with piece- 
wise linear curves and interpolating the maximum distance 
between them over the range of a^ > cTw- The range of 
c^z < fw is ignored since the linear and nonlinear reconstruc- 
tions perform similarly when the additive noise dominates (as 
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Oversampling factor (M) 
(a) K = 10, M varies, 0.01 < (t^ < 0.5, a^ = 0.1. 




(b) K = 10, M = 8, 0.01 < o-z < 0.5, o-„ varies. 

Fig. 8. These graphs show the maximum factor of improvement in jitter 
tolerance, measured by a^ , achievable by the EM algorithm (relative to linear 
reconstruction). Holding (t,„ fixed, [(a)] shows the trend in maximum improve- 
ment as M increases, and |(b)| shows the trend in maximum improvement 
as (Tu, increases while holding A/ fixed. The jitter standard deviation a* 
con'esponding to this maximum improvement for the ML estimator is plotted 
on the same axes. 



expected). The proportion of improvement increases linearly 
as M increases. As cTiu increases, the level of improvement 
stays approximately the same for a^j < ctz- However, when 
aw increases beyond CTz, the level of improvement decreases 
substantially as expected, since the additive noise dominates, 
and the optimal estimator is approximately linear A maximum 
az improvement factor of two corresponds to power savings 
of up to 75 percent. 

VII. Conclusion 

The results presented in Section |VT] are very encouraging 
from a power-consumption standpoint. A maximum improve- 
ment of between 1.4 to 2 times the jitter translates to a 
two-to-fourfold decrease in power consumption by the clock, 
according to (|3]l. To put the magnitude of such an improvement 
in context, consider the digital baseband processor for ultra- 
wideband communication in |25|. This processor incorporates 
an ADC and a PLL, which consume 86 mW and 45 mW, 
respectively, out of a 271 mW budget for the chip. Reducing 
by a factor of two the power consumed by the ADC alone 
would decrease the total power consumption of the chip by 
almost sixteen percent. 

While effective, the EM algorithm is computationally ex- 
pensive. One benefit of digital post-processing is that these 
algorithms can be performed off-chip, on a computer or other 
system with less limited computational resources. For real- 
time on-chip applications, Kalman filter-like versions of the 



EM algorithm would be more practical; this extension is a 
topic for further investigation. Related to real-time processing 
is developing streaming algorithms for the infinite-dimensional 
case, extending this work for general real-time sampling 
systems. Another future direction involves modifying these 
algorithms for correlated or periodic jitter 

Sampling jitter mitigation is actually just one application 
of these new algorithms. In the frequency domain, jitter maps 
to uncertainty in frequency; using algorithms such as these 
should produce more reliable Fourier transforms for systems 
like spectrum analyzers. In higher dimensions, timing noise 
becomes location jitter in images or video. Greater tolerance 
of the locations of pixels in images would allow scanning 
electron microscope users to acquire higher resolution images 
without sacrificing MSE. This paper shows that significant im- 
provements over the best linear post-processing are possible; 
thus, further work may impact these and other applications. 
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